Adequate prediction of a response variable using a multiple linear
regression model is shown in this article to be related to the [...] of
multicollinearities among the predictor variables. If strong
multicollinearities [...] determine when prediction is likely to be
accurate. A region of prediction, R, is proposed as a guide for prediction
purposes. This region is related to a prediction interval when the matrix
of predictor variables is of full column rank, but it can also be used when
the sample is undersized. The Gorman-Toman (1966) ten variable data is used
to illustrate the effectiveness of the region R.

1.  INTRODUCTION 

Prediction  of  future  observations  is  one  of  the  primary  uses  of  an 
estimated  linear  regression  model.  Although  a large  number  of  papers  and 
books  have  been  written  on  the  analysis  of  regression  data,  the  emphasis 
in  the  literature  is  heavily  weighted  toward  problems  of  model  building 
and  estimation  of  model  parameters,  and  not  on  recommendations  for  using 
prediction  equations.  While  these  problems  are  all  related,  they  do  not 
necessarily  place  the  same  demands  on  the  estimated  model. 

Currently much of the statistical literature on linear regression is
focussing on properties of biased regression estimators. Notable articles
include James and Stein (1961), Hoerl and Kennard (1970), Marquardt (1970),
Lindley and Smith (1972), Hawkins (1973), and Webster, Gunst, and Mason
(1974). Biased estimation is receiving such prominence due to the
realization that multicollinearity among the predictor variables (defined
in Section 2) tends to severely distort the least squares estimates of the
regression parameters. This in turn can result in poor prediction of
future responses. Subset selection procedures likewise are not immune to
distortion in the presence of multicollinear data.

Underlying this need for good parameter estimates is the assumption
that the fitted model is to be used to predict over a wide region of
interest of the predictor variables, perhaps an entire rectangular region
defined by the extreme values observed on each predictor variable. This
may be an unduly stringent assumption, as Hocking (1976) discusses from a
variable selection viewpoint. In other words, frequently it is not
necessary to predict over such a wide region. When this is so, accurate
predictions may be possible despite uncertainties about the goodness of
individual parameter estimates.

The purpose of this paper is to better identify when prediction is
likely to be accurate with multicollinear data. Specifically, this paper
was stimulated by three problems noticed by Owen and Reynolds (1968) in
their development of a prediction equation for estimating engineering
man-hours for proposed aircraft programs:


1) they decided to include "no more than 12" predictor variables
   from a total of about 60 possible ones since only 23 observations
   on the response variable (engineering man-hours) were available;

2) a backward elimination (Draper and Smith (1966), Chapter 6)
   procedure was performed to further reduce the number of predictor
   variables [...] interaction with other variables;

3) the authors concluded that "some limits of extrapolation for [...]
   formulas should be a primary objective of future studies."


We will address each of these problems in subsequent sections, not with
the goal of providing final, definitive solutions to them, but rather to
show how each affects the estimation of the regression parameters and the
use of the resulting prediction equation. We do not intend to argue that
any particular estimator is the best one to use with multicollinear data.
We will, however, point out some advantages of using a principal component
estimator to obtain a prediction equation.


2.  LEAST  SQUARES  PREDICTION  EQUATIONS 


In this section we will examine problems 2) and 3) of Owen and Reynolds
(1968) which were listed in the previous section. Suppose the assumed
linear regression model is written as

$$Y = \beta_0 \mathbf{1} + X\beta + \varepsilon, \tag{1}$$

where $Y$ is an $(n \times 1)$ vector of observations on the response
(dependent) variable, $\mathbf{1}$ is an $(n \times 1)$ vector of ones,
$X$ is an $(n \times p)$ full column rank matrix of predictor (independent)
variables, $\beta_0$ is an unknown constant, $\beta$ is a $(p \times 1)$
vector of unknown regression parameters, and $\varepsilon$ is an
$(n \times 1)$ vector of unobservable random error terms with
$\varepsilon \sim N(0, \sigma^2 I)$. For simplicity, we assume that the
elements of $X$ are standardized so that $X_j'\mathbf{1} = 0$ and
$X_j'X_j = 1$ for $j = 1, 2, \ldots, p$. Finally, model (1) is assumed to
adequately represent the response variable although some of the predictor
variables may not be needed for adequate prediction.

The latter two problems cited by Owen and Reynolds result from
inadequacies in the data used to estimate the model parameters; in
particular, from multicollinearities in the data. A multicollinearity can
be defined as a linear combination of the columns of $X$ that is nearly
zero. This implies that $X'X$ is nearly singular. A multicollinearity is
not necessarily due to some variables being redundant in the specification
of the model; the variables may instead be redundant only for the data
collected.

Redundant model variables, those variables that will be redundant
for all samples of data, can and should be deleted from the model since
they serve only to inflate the variance of predicted responses (see, e.g.,
Hocking (1976)). If the redundancy is inherent only in the particular
data sampled, it is dangerous to remove the variables from the predictor
since the estimated model may then be biased when future responses are
predicted. Yet multicollinearities tend to cause the deletion of one or
more of the multicollinear variables merely because they are involved in
multicollinearities, not because they are worthless predictor variables.

To see this latter point, denote the eigenvalues of $X'X$ by
$\ell_1 \ge \ell_2 \ge \cdots \ge \ell_p$ and the corresponding
eigenvectors by $V_1, V_2, \ldots, V_p$. If there are one or more
multicollinearities among the columns of $X$, one or more of the
eigenvalues of $X'X$ will be nearly zero. For eigenvalues that are near
zero, multicollinearities can be identified by noting that

$$V_j'X'XV_j = \ell_j \approx 0 \;\Rightarrow\; \sum_{i=1}^{p} V_{ij}X_i \approx 0. \tag{2}$$

Equation (2) shows that the eigenvector $V_j$ corresponding to a small
eigenvalue $\ell_j$ provides the coefficients for the linear combination of
the columns of $X$ causing a multicollinearity. Naturally, the larger
elements in $V_j$ identify the predictor variables most strongly
multicollinear. Mason, et al. (1975) contains a more complete discussion of
multicollinearities and the problems associated with them.
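
The diagnostic in (2) can be sketched numerically. The following is a minimal illustration under stated assumptions, not code from the paper: the data are synthetic, and the helper names `standardize` and `multicollinearities` and the cutoff `tol` are introduced here only for the example.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit length, so that X'X has unit diagonal."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc, axis=0)

def multicollinearities(X, tol=0.01):
    """Eigenvalues (decreasing) and eigenvectors of X'X, plus indices of eigenvalues below tol.

    tol is a hypothetical cutoff for "nearly zero"; the source does not fix one.
    """
    evals, evecs = np.linalg.eigh(X.T @ X)      # eigh returns eigenvalues in ascending order
    order = np.argsort(evals)[::-1]             # reorder so that l_1 >= l_2 >= ... >= l_p
    evals, evecs = evals[order], evecs[:, order]
    return evals, evecs, np.flatnonzero(evals < tol)

# Synthetic data (assumption): the third predictor is almost the sum of the first two.
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=30), rng.normal(size=30)
x3 = x1 + x2 + 0.01 * rng.normal(size=30)
X = standardize(np.column_stack([x1, x2, x3]))

evals, evecs, small = multicollinearities(X)
for j in small:
    combo = X @ evecs[:, j]                     # the linear combination of columns in (2)
    print(f"l_{j + 1} = {evals[j]:.2e}, V_{j + 1} = {np.round(evecs[:, j], 3)}, "
          f"|X V_{j + 1}|^2 = {combo @ combo:.2e}")
```

All three elements of the flagged eigenvector are sizable here, which matches the way the synthetic collinearity was built from all three columns.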

The least squares estimator of $\beta$ for the model specified by (1) is

$$\hat{\beta} = (X'X)^{-1}X'Y.$$

The variances and covariances of the $\hat{\beta}_j$ can be found from

$$\mathrm{Var}[\hat{\beta}] = (X'X)^{-1}\sigma^2 = \sum_{j=1}^{p} \ell_j^{-1} V_j V_j' \sigma^2. \tag{3}$$

From (3) we can see that small eigenvalues in $X'X$ will result in large
variances and covariances for estimated parameters of variables involved
in multicollinearities (those with large $V_{ij}$ values in (2)).

When attempting to reduce the number of variables in the prediction
equation, the t statistic commonly used to test $H_0\colon \beta_j = 0$ is

$$t = \hat{\beta}_j / (c_{jj}\,\mathrm{MSE})^{1/2}, \tag{4}$$

where $c_{jj}$ is the jth diagonal element of $(X'X)^{-1}$ and MSE is the
estimate of $\sigma^2$ computed from the full model (1). Since the $c_{jj}$
values of variables involved in multicollinearities tend to be large due to
the small $\ell_j$ in (3), the t statistics corresponding to these
variables tend to be small. This accounts for the tendency for variables to
be deleted by some computer programs because of their "interaction with
other variables".

Contrary to Owen and Reynolds' supposition, backward elimination
also suffers from this problem. Backward elimination deletes the variable
with the smallest t statistic at each stage. Since multicollinear
variables tend to have small t statistics, at least one multicollinear
predictor variable is likely to be deleted from the model. See Gunst, et
al. (1976) for an illustration of this property.
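
The effect of (3) on the t statistics in (4) can be illustrated with a short sketch. It is not the authors' computation: the data are synthetic, and the function name `ls_t_statistics` and the coefficient values are assumptions made for the example.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit length, as assumed for model (1)."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc, axis=0)

def ls_t_statistics(X, y):
    """Least squares fit for standardized X; returns (beta_hat, c, t).

    c[j] is the jth diagonal element of (X'X)^{-1} and
    t[j] = beta_hat[j] / sqrt(c[j] * MSE), as in (4).
    """
    n, p = X.shape
    evals, V = np.linalg.eigh(X.T @ X)
    XtX_inv = (V / evals) @ V.T            # sum_j V_j V_j' / l_j, the expansion in (3)
    beta_hat = XtX_inv @ X.T @ y           # columns of X are centered, so beta0_hat = mean(y)
    resid = y - y.mean() - X @ beta_hat
    mse = resid @ resid / (n - p - 1)
    c = np.diag(XtX_inv)
    return beta_hat, c, beta_hat / np.sqrt(c * mse)

# Synthetic data (assumption): x3 nearly duplicates x1, while x2 is unrelated.
rng = np.random.default_rng(0)
n = 40
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = x1 + 0.05 * rng.normal(size=n)
raw = np.column_stack([x1, x2, x3])
y = 2.0 + x1 + x2 + x3 + rng.normal(size=n)     # every predictor truly matters

X = standardize(raw)
beta_hat, c, t = ls_t_statistics(X, y)
print("c_jj:", np.round(c, 1))     # inflated for the collinear pair x1, x3
print("t:   ", np.round(t, 2))
```

Because the $c_{jj}$ of the collinear pair are inflated, either variable may show a small t statistic and become a candidate for deletion even though both enter the simulated response.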

The problem of eliminating important predictor variables, problem
2) of the previous section, is thus directly related to multicollinearities
in the data. Multicollinearities in the data used to estimate $\beta$ may
affect prediction even if all the predictor variables are used in the
prediction equation. Write the least squares prediction equation as

$$\hat{Y} = \hat{\beta}_0 + u'\hat{\beta}, \tag{5}$$

where $\hat{\beta}_0 = \bar{Y}$ and $u$ is a vector of values of the p
predictor variables which are standardized as in (1).

Since $\bar{Y}$ generally estimates $\beta_0$ well, (5) will be an adequate
predictor of the response if

$$u'\hat{\beta} \approx u'\beta$$

for all values of $u$ in some region of interest. Now $u'\hat{\beta}$ is an
unbiased estimator of $u'\beta$, with variance

$$\mathrm{Var}[u'\hat{\beta}] = \sigma^2 u'(X'X)^{-1}u = \sigma^2 \sum_{j=1}^{p} \ell_j^{-1}(u'V_j)^2. \tag{6}$$

It can be seen from (6) that $\mathrm{Var}[u'\hat{\beta}]$ will be
unacceptably large for many points $u$ if one or more of the $\ell_j$ are
sufficiently small, or some of the $u'V_j$ are large.
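
As a numerical check of (6), the sketch below compares the variance factor for a point lying along the eigenvector of the smallest eigenvalue with one lying along the eigenvector of the largest. The data are synthetic and the function name `variance_terms` is an assumption of this example.

```python
import numpy as np

def variance_terms(X, u):
    """Terms (u'V_j)^2 / l_j of (6); their sum times sigma^2 is Var[u' beta_hat]."""
    evals, V = np.linalg.eigh(X.T @ X)     # ascending: evals[0] is the smallest eigenvalue
    return (u @ V) ** 2 / evals, evals, V

# Synthetic standardized predictors (assumption): x3 is nearly x1 + x2.
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=30), rng.normal(size=30)
raw = np.column_stack([x1, x2, x1 + x2 + 0.01 * rng.normal(size=30)])
Xc = raw - raw.mean(axis=0)
X = Xc / np.linalg.norm(Xc, axis=0)

_, evals, V = variance_terms(X, X[0])
u_along_small = V[:, 0]        # unit step along the multicollinear direction
u_along_large = V[:, -1]       # unit step along the best-sampled direction

for name, u in [("small-eigenvalue direction", u_along_small),
                ("large-eigenvalue direction", u_along_large)]:
    terms, _, _ = variance_terms(X, u)
    print(f"{name}: Var[u'beta_hat] / sigma^2 = {terms.sum():.3g}")
```

Two points at the same distance from the centroid can therefore have very different prediction variances, depending only on their projections $u'V_j$.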


A commonly known but infrequently used means of estimating the
precision of prediction is by forming a $100(1-\alpha)\%$ prediction
interval for the point $u$:

$$\hat{Y} - t_\nu(\alpha/2)\, s \le Y \le \hat{Y} + t_\nu(\alpha/2)\, s, \tag{7}$$

where $t_\nu(\alpha/2)$ is the upper $100(\alpha/2)\%$ critical point of the
t distribution with $\nu = n-p-1$ degrees of freedom, and
$s = [(1 + n^{-1} + u'(X'X)^{-1}u)\,\mathrm{MSE}]^{1/2}$. The width of this
prediction interval depends on $\sum_{j=1}^{p} \ell_j^{-1}(u'V_j)^2$, as in
(6).

Both (6) and (7) essentially depend on how small $u'V_j$ is relative to
$\ell_j$. If $\ell_j$ is extremely small, then $u'V_j$ must also be small
or prediction will be poor. These considerations suggest the definition of
a "region of predictability" wherein prediction would be expected to be
suitably accurate. One such region can be defined as

$$R = \{u : |u'V_j| \le c_j,\ j = 1, \ldots, p, \text{ and } a_i \le u_i \le b_i\}, \tag{8}$$

where $a_i$ and $b_i$ are the minimum and maximum standardized values of
the ith predictor variable, i.e., $a_i = \min\{X_{ki};\ k = 1, 2, \ldots, n\}$
and $b_i = \max\{X_{ki};\ k = 1, 2, \ldots, n\}$ for $i = 1, 2, \ldots, p$.

Two methods for choosing the $c_j$ are

(i) $c_j = \ell_j^{1/2}$, or

(ii) $c_j = \max\{|w_i'V_j|,\ i = 1, 2, \ldots, n\}$, where $w_i'$ is the ith
row of $X$.

Method (i) insures that $\ell_j^{-1}(u'V_j)^2 \le 1$, while (ii) bounds
$u'V_j$ by the largest of the values for the points $w_i$ used to estimate
$\beta$. Each of these methods can be interpreted as requiring that the
prediction equation only be used in regions for which data has been
collected; i.e., the requirements (i) and (ii) limit extrapolation. If one
wishes to predict outside R, the predicted values must be cautiously used,
but this does indicate a partial response to point 3) of Owen and Reynolds.
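
A membership test for the region R in (8), with either choice of the $c_j$, might be sketched as follows. The function names `region_bounds` and `in_region`, the round-off tolerance `tol`, and the synthetic data are assumptions of this illustration rather than part of the original development.

```python
import numpy as np

def region_bounds(X, method="i"):
    """Quantities defining R in (8) for a standardized n x p matrix X.

    method "i":  c_j = l_j^{1/2};  method "ii": c_j = max_i |w_i' V_j|.
    """
    evals, V = np.linalg.eigh(X.T @ X)
    if method == "i":
        c = np.sqrt(np.clip(evals, 0.0, None))   # clip guards against tiny negative round-off
    else:
        c = np.abs(X @ V).max(axis=0)
    return V, c, X.min(axis=0), X.max(axis=0)

def in_region(u, V, c, a, b, tol=1e-12):
    """True if |u'V_j| <= c_j for every j and a_i <= u_i <= b_i for every i."""
    within_directions = np.all(np.abs(u @ V) <= c + tol)
    within_box = np.all((a - tol <= u) & (u <= b + tol))
    return bool(within_directions and within_box)

# Synthetic standardized predictors (assumption): x3 is nearly x1 + x2.
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=30), rng.normal(size=30)
raw = np.column_stack([x1, x2, x1 + x2 + 0.01 * rng.normal(size=30)])
Xc = raw - raw.mean(axis=0)
X = Xc / np.linalg.norm(Xc, axis=0)

for method in ("i", "ii"):
    V, c, a, b = region_bounds(X, method)
    u_sampled = X[0]                      # a point used in estimation
    u_off = X[0] + 0.5 * V[:, 0]          # same point moved along the multicollinear direction
    print(method, in_region(u_sampled, V, c, a, b), in_region(u_off, V, c, a, b))
```

Every row of X lies in R under both methods, since the squared projections of the rows onto $V_j$ sum to $\ell_j$.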


By far the worst prediction will occur for points which have values
of $|u'V_j|$ that are large for small values of $\ell_j$. Suppose r
multicollinearities have been detected by a careful examination of the
$\ell_j$ and $V_j$, as well as possibly other procedures such as
investigating the "correlation" matrix, $X'X$, or the variance inflation
factors (Marquardt (1970), Marquardt and Snee (1975)). The $c_j$, $a_i$,
and $b_i$ could then be relaxed in (8) for the first (p-r) directions
$u'V_j$, j = 1, 2, ..., p-r. In these directions extrapolation could be
allowed with the knowledge that (7) would still provide reasonable bounds.
These ideas will become even more important with the discussion of
problem 1) in Section 3.
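
The prediction interval (7) referred to above can be computed directly. The sketch below is illustrative only: the data are synthetic, the function name `prediction_interval` is an assumption, and the critical value is passed in by the caller (2.056 is the tabled upper 2.5% point of t with 26 degrees of freedom).

```python
import numpy as np

def prediction_interval(X, y, u, t_crit):
    """Interval (7) at a standardized point u.

    t_crit is the upper 100(alpha/2)% point of the t distribution with
    n - p - 1 degrees of freedom, supplied by the caller (e.g. from tables).
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - y.mean() - X @ beta_hat
    mse = resid @ resid / (n - p - 1)
    y_hat = y.mean() + u @ beta_hat                       # prediction equation (5)
    s = np.sqrt((1.0 + 1.0 / n + u @ XtX_inv @ u) * mse)
    return y_hat - t_crit * s, y_hat + t_crit * s

# Synthetic data (assumption): x3 is nearly x1 + x2; n = 30, p = 3, so nu = 26.
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=30), rng.normal(size=30)
raw = np.column_stack([x1, x2, x1 + x2 + 0.01 * rng.normal(size=30)])
y = 1.0 + raw @ np.array([1.0, -1.0, 2.0]) + 0.1 * rng.normal(size=30)
Xc = raw - raw.mean(axis=0)
X = Xc / np.linalg.norm(Xc, axis=0)

_, V = np.linalg.eigh(X.T @ X)            # V[:, 0] belongs to the smallest eigenvalue
lo_in, hi_in = prediction_interval(X, y, X[0], t_crit=2.056)
lo_out, hi_out = prediction_interval(X, y, X[0] + 0.5 * V[:, 0], t_crit=2.056)
print("width at a sampled point:       ", hi_in - lo_in)
print("width off the sampled subspace: ", hi_out - lo_out)
```

The point u must be standardized with the same centering and scaling as the columns of X for the interval to be meaningful.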

Note that R is based solely on sample information, information
available to the data analyst at the time he wishes to make a prediction.
If (8) is not satisfied, prediction may, and sometimes will, be accurate
since (5) is an unbiased estimator of $\beta_0 + u'\beta$. Prediction for
$u \in R$ provides the assurance that the prediction equation is suitably
precise.

If variable selection procedures are used to reduce the number of
variables in the model, prediction will be adequate provided (2) holds
for the points at which prediction is desired. This implies that
$u'V_j \approx 0$ for these new points. But this restriction is again in
the form of a region (8) with the $c_j$ chosen suitably small for
j = p-r+1, p-r+2, ..., p. Thus if a region of predictability of the form
(8) is constructed, least squares estimation and variable selection
techniques will yield prediction equations which are accurate despite the
multicollinearities in the data used to estimate the parameters. Outside
this region the predictor cannot be expected to perform well due to large
variances of the predictor or bias due to erroneously deleting important
predictor variables.

3.  A  PRINCIPAL  COMPONENT  PREDICTOR  FOR  UNDERSIZED  SAMPLES

Owen and Reynolds' first problem, having to use only about 12 of 60
possible predictor variables in their initial models, results from fewer
observations than predictor variables being available for the analysis.
The full rank analysis of (1) using least squares requires that n>p, a
requirement not satisfied by their data.

There is a wide range of model-building problems that could be
addressed at this point concerning specification of model (1), but it is
not within the scope of this paper to do so. We merely wish to raise the
obvious questions regarding the deletion of many potentially valuable
predictor variables (i) subjectively, (ii) on the basis of a partial
analysis of the response and a subset of the predictor variables, or (iii)
by using a stepwise procedure such as forward selection (see Mantel (1970)
for some objections to this technique for full rank models). One
acceptable basis for deleting variables prior to an analysis of the
complete (assumed correct) model (1) is the presence of model redundancies.

Rather than demanding a full rank analysis, generalized inverse
estimators offer another option. The generalized inverse solution is
generally presented in a discussion of singular X matrices (as in designed
experiments) for which n>p (see, e.g., Rao (1965), Searle (1971), or Theil
(1971)). While the existence of this estimator of $\beta$ and its
estimability characteristics are well-known, its potential use with
undersized samples (n<p) has not been fully explored. An exception to this
statement is in the economic literature of simultaneous equations systems
(Fisher and Wadycki (1971), Khazzoom (1975), Swamy and Holmes (1971)).


The particular generalized inverse estimator we will examine in this
section is referred to in the literature (e.g., Massy (1965), Marquardt
(1970)) as a principal components estimator. If n>p and X has rank p-r
[...] 0), the principal component [...] $V_{p-r}$], and
$L_L = \mathrm{diag}(\ell_1, \ell_2, \ldots, \ell_{p-r})$. It is often
demonstrated that (9) is the least squares estimator of $\beta$ subject to
the constraints $V'\beta = 0$, [...]

With undersized samples (i.e., n<p) there are typically s very small
eigenvalues of $X'X$ in addition to the r zero ones. We propose, therefore,
a generalization of (9) for undersized samples which is of the same form
but with V [...] it can be shown that this is [...]
$V_s = [V_{p-s-r+1}, V_{p-s-r+2}, \ldots, V_{p-r}]$.

Our rationale for using this estimator of $\beta$ stems from a different
justification than the parameter constraints given above. This
justification stresses the use of the actual information provided by the
matrix of predictor variables and, as we shall see, again yields guidelines
for the use of the resulting predictor.


[...] contains the eigenvectors corresponding to the eigenvalues in $L$
[...] $\mathrm{diag}(\ell_{p-s-r+1}, \ell_{p-s-r+2}, \ldots, \ell_{p-r})$,
and $H_0$ contains the eigenvectors corresponding to the n-p+r zero
eigenvalues. For undersized samples, r, the number of zero latent roots of
$X'X$, is generally equal to p-(n-1), so that $H_0$ contains only one
vector. Then we can write (e.g., Good (1969))

$$X = H_0 \Phi V_0' + H_s L_s^{1/2} V_s' + H_L L_L^{1/2} V_L' = X_0 + X_s + X_L, \tag{10}$$

where $\Phi$ is an $(n-p+r) \times r$ matrix of zeros and
$X_0 = H_0 \Phi V_0'$, etc. Since $X_0 = \Phi$ and $X_s \approx \Phi$
(since $L_s$ contains small eigenvalues), we see from (10) that
$X \approx X_L$. This emphasizes the point that the entire space of
predictor variables has not been sampled, only a subspace that is primarily
spanned by the eigenvectors in $V_L$. Inserting $X_L$ in place of $X$ in
(1) and obtaining the principal component solution to the normal equations
yields (9) with $V_L$ defined as above. This argument can also be used to
justify the use of a principal component estimator for the full rank model
if multicollinearities are present since, then,
$X = X_s + X_L \approx X_L$.
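
The decomposition (10) and the resulting estimator can be sketched with a singular value decomposition. This is a hedged illustration: equation (9) is not legible in this copy, so the code assumes the usual principal component form $\hat{\beta} = V_L L_L^{-1} V_L' X'Y$, which agrees with (12) and (13). The function name `pc_estimator`, the cutoff `small_tol`, and the synthetic undersized data are likewise assumptions.

```python
import numpy as np

def pc_estimator(X, y, small_tol=0.05):
    """Principal component estimate V_L L_L^{-1} V_L' X'y (assumed form of (9)).

    Eigenvalues of X'X at or below small_tol (a hypothetical cutoff) are treated
    as the small or zero roots and their eigenvectors are dropped.
    """
    H, sv, Vt = np.linalg.svd(X, full_matrices=False)   # X = H diag(sv) Vt
    evals = sv ** 2                                      # the min(n, p) largest eigenvalues of X'X
    keep = evals > small_tol
    V_L, l_L = Vt[keep].T, evals[keep]
    X_L = (H[:, keep] * sv[keep]) @ Vt[keep]             # H_L L_L^{1/2} V_L' in (10)
    beta_hat = V_L @ ((V_L.T @ (X.T @ y)) / l_L)
    return beta_hat, X_L, evals, keep

# Synthetic undersized sample (assumption): n = 8 observations on p = 10 predictors
# that are driven by three underlying factors plus a little noise.
rng = np.random.default_rng(0)
n, p = 8, 10
factors = rng.normal(size=(n, 3))
raw = factors @ rng.normal(size=(3, p)) + 0.01 * rng.normal(size=(n, p))
Xc = raw - raw.mean(axis=0)
X = Xc / np.linalg.norm(Xc, axis=0)
y = factors @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

beta_hat, X_L, evals, keep = pc_estimator(X, y)
print("eigenvalues of X'X:", np.round(evals, 4))
print("components kept:", int(keep.sum()))
print("squared norm of X - X_L:", np.linalg.norm(X - X_L) ** 2)
```

The squared norm of $X - X_L$ equals the sum of the dropped eigenvalues, which is the numerical content of $X \approx X_L$.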

The principal component prediction equation for undersized samples,

$$\hat{Y} = \bar{Y} + u'\hat{\beta}, \tag{11}$$

is biased. The bias of (11) can be written

$$B(\hat{Y}) = u'\beta - u'(X_L'X_L)^{-}X_L'X\beta = u'\beta - u'V_L V_L'\beta, \tag{12}$$

and the variance of $u'\hat{\beta}$ is

$$\mathrm{Var}[u'\hat{\beta}] = u'(X_L'X_L)^{-}u\,\sigma^2 = \sigma^2 \sum_{j=1}^{p-s-r} \ell_j^{-1}(u'V_j)^2. \tag{13}$$


The variance term (13) does not suffer from having small eigenvalues as
does (6), but (12) indicates that the predictor is generally biased. Note
that if $V_0'u = 0$ and $V_s'u = 0$, $u'\beta = u'V_L V_L'\beta$, and (11)
indeed turns out to be unbiased. This again reflects the fact that
prediction should be accurate if we restrict the region of predictability
to points in a general region that was actually sampled.
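
The trade-off between (12) and (13) can be checked on a small constructed example. Everything here is hypothetical: the eigenvalues, the orthonormal vectors standing in for $V_1, \ldots, V_6$, and the "true" parameter vector, which is only known because the example is a simulation. The function name `pc_bias_variance` is also an assumption.

```python
import numpy as np

def pc_bias_variance(u, beta, V_L, l_L, sigma2=1.0):
    """Bias (12) and variance (13) of the principal component predictor at u."""
    bias = u @ beta - (u @ V_L) @ (V_L.T @ beta)
    variance = sigma2 * np.sum((u @ V_L) ** 2 / l_L)
    return bias, variance

# Hypothetical eigenstructure for p = 6: four retained roots, one small root, one zero root.
rng = np.random.default_rng(1)
V, _ = np.linalg.qr(rng.normal(size=(6, 6)))     # orthonormal columns standing in for V_1..V_6
l = np.array([2.5, 1.8, 1.2, 0.496, 0.004, 0.0])
V_L, l_L = V[:, :4], l[:4]
V_s = V[:, 4]
beta = rng.normal(size=6)                        # "true" parameters, known only in a simulation

u_in = V_L @ np.array([1.0, -0.5, 0.3, 0.2])     # V_0'u = 0 and V_s'u = 0
u_out = u_in + 0.4 * V_s                         # leaves the sampled subspace along V_s

print(pc_bias_variance(u_in, beta, V_L, l_L))
print(pc_bias_variance(u_out, beta, V_L, l_L))
print("least squares variance term along V_s:", 0.4 ** 2 / l[4])
```

Moving along $V_s$ leaves the variance (13) unchanged but introduces the bias $0.4\,V_s'\beta$, whereas the corresponding least squares term in (6) would be inflated by the small root.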

This discussion suggests a region similar to (8) within which
prediction could be proposed, but outside of which prediction should not be
recommended. Extrapolation can also be allowed in the space spanned by
$V_L$. An evaluation of these recommendations is the subject of the next
section.

4.  AN  ASSESSMENT  OF  R 

In this section an example is presented to illustrate the potential
benefits of using a region such as R as a guide in predicting. Again, a
prediction interval of the form (7) is preferable to R when X is of full
column rank and a sufficient number of observations are available to
obtain a good estimate of $\sigma^2$. Otherwise, R can still be effectively
used, as is now demonstrated.

The example concerns the ten variable data of Gorman and Toman (1966).
A detailed analysis of this data, including a listing of the raw data, is
given in Daniel and Wood (1971). Two analyses of this data are to be
performed here: (i) a full rank analysis in which the first 15 of the n=36
data points are used to obtain a predictor, and (ii) an undersized sample
analysis in which only the first 10 of the 36 data points are used. Each
predictor is then used to predict the remaining observations.
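
The hold-out scheme just described (fit on the first rows, predict the rest, and compare each residual with $|u'V|$ for the smallest latent root) can be sketched as follows. The Gorman-Toman data are not reproduced here, so the sketch uses synthetic data with the same split sizes as analysis (i); all variable names and coefficient values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_train = 36, 15          # sizes mirror the full rank analysis; the data are synthetic
x1, x2 = rng.normal(size=n), rng.normal(size=n)
# Assumption: x3 = x1 + x2 holds tightly in the fitting rows and loosely afterwards.
slack = np.where(np.arange(n) < n_train, 0.01, 0.5)
x3 = x1 + x2 + slack * rng.normal(size=n)
raw = np.column_stack([x1, x2, x3])
y = 1.0 + raw @ np.array([1.0, -1.0, 2.0]) + 0.1 * rng.normal(size=n)

# Standardize with the fitting rows only; new points use the same centering and scaling.
center = raw[:n_train].mean(axis=0)
scale = np.linalg.norm(raw[:n_train] - center, axis=0)
X_fit = (raw[:n_train] - center) / scale
U_new = (raw[n_train:] - center) / scale

evals, V = np.linalg.eigh(X_fit.T @ X_fit)        # ascending: V[:, 0] has the smallest root
beta_hat = np.linalg.solve(X_fit.T @ X_fit, X_fit.T @ y[:n_train])
y_pred = y[:n_train].mean() + U_new @ beta_hat    # prediction equation (5)

residual = np.abs(y[n_train:] - y_pred)
projection = np.abs(U_new @ V[:, 0])              # |u'V| for the smallest latent root
for proj, res in sorted(zip(projection, residual)):
    print(f"|u'V| = {proj:.3f}   |residual| = {res:.3f}")
```

Listing the hold-out points in order of $|u'V|$ gives a tabular counterpart of a residual-versus-projection plot.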


With the full rank analysis [...] 0.0062, with corresponding latent vector
[...]

From the discussion of Section 2, both (7) and (8) suggest that prediction
should not be attempted unless $u'V$ is small. (For simplicity and ease
of discussion, we are only considering one small latent root in this [...]
0.019, we may wish to consider the magnitude of $u'V$ as well). Using
least squares, a predictor of the form (5) was [...] (labeled "RESIDUAL")
versus $|u'V|$ (labeled "VS PRIME U") for the 36-15 = 21 data points not
used to estimate the parameters in the prediction equation. The trend is
clear: the magnitude of the residuals increases with the magnitude of
$u'V$. While some moderate-sized residuals do occur with small magnitudes
of $u'V$, there are no small residuals for large magnitudes of $u'V$.

Also evident from Figure 1 is the need to explore possible bounds on [...]
The two suggested in Section 2 turn out to be [...]

While these bounds may be extremely effective for new values of $u$ which
satisfy (i) or (ii), the smallest value of $|u'V|$ for the 21 additional
[...] that, at least qualitatively, a region such as R can be effective in
assessing when prediction should not be attempted.


[Figure 1. Residuals of the Gorman-Toman data, full rank analysis:
"RESIDUAL" plotted against "VS PRIME U".]

For the undersized sample, $\ell_{10} = 0$ and $\ell_9 = 0.0104$. The
corresponding latent vectors are

(.221, .046, -.131, .064, .562, -.645, -.092, .148, .219, -.342),

(.661, -.085, -.367, -.059, .088, .495, .290, .188, .160, -.139).

The latent vectors corresponding to the 8 remaining latent roots were
used to estimate $\beta$ as in (9) and then form the prediction equation in
(11). Figure 2 is a plot of the residuals, $|Y_i - \hat{Y}_i|$, of the
remaining 36-10 = 26 data points (with a "+" indicating
$|Y_i - \hat{Y}_i| < 0.75$, a "X" indicating
$0.75 \le |Y_i - \hat{Y}_i| < 1.50$, and "O" indicating
$1.50 \le |Y_i - \hat{Y}_i|$) as a function of $|u'V_9|$ (labeled
"VS PRIME U") and $|u'V_{10}|$ (labeled "VO PRIME U"). Again the trend is
clear: smaller residuals occur predominantly with smaller values of both
$|u'V_9|$ and $|u'V_{10}|$.


5.  SUMMARY

The intent of this paper is to focus attention on an aspect of
regression analysis that is often overlooked when the resulting prediction
equation is employed. Regardless of the sample size used to obtain
estimates of model parameters (and particularly when the sample size is
small), estimation is highly inaccurate outside a region generally defined
by (8). Yet regions of this form are always available to the data analyst
and can be very valuable as guides in predicting. The Gorman-Toman data
illustrates that both in the full rank situation and the undersized sample
case, a region R formed by considering $u'V_j$ for latent vectors $V_j$
corresponding to zero or small latent roots of $X'X$ was effective in
identifying when prediction was likely to be inadequate. Further work in
this area should concentrate on refining R; in particular, developing
reasonable bounds, $c_j$, for (8) based on the information in X.